Mapping age- and sex-specific HIV prevalence in adults in sub-Saharan Africa, 2000–2018

Background: Human immunodeficiency virus and acquired immune deficiency syndrome (HIV/AIDS) is still among the leading causes of disease burden and mortality in sub-Saharan Africa (SSA), and the world is not on track to meet targets set for ending the epidemic by the Joint United Nations Programme on HIV/AIDS (UNAIDS) and the United Nations Sustainable Development Goals (SDGs). Precise HIV burden information is critical for effective geographic and epidemiological targeting of prevention and treatment interventions. Age- and sex-specific HIV prevalence estimates are widely available at the national level, and region-wide local estimates were recently published for adults overall. We add further dimensionality to previous analyses by estimating HIV prevalence at local scales, stratified into sex-specific 5-year age groups for adults ages 15–59 years across SSA.

Methods: We analyzed data from 91 seroprevalence surveys and sentinel surveillance among antenatal care clinic (ANC) attendees using model-based geostatistical methods to produce estimates of HIV prevalence across 43 countries in SSA, from 2000 to 2018, at a 5 × 5-km resolution, and present them aggregated to second administrative level units (typically districts or counties).

Results: We found substantial variation in HIV prevalence across localities, ages, and sexes that has been masked in earlier analyses. Within-country variation in prevalence in 2018 was a median 3.5 times greater across ages and sexes than for all adults combined. We note large within-district prevalence differences between age groups: for men, 50% of districts displayed at least a 14-fold difference between the age groups with the highest and lowest prevalence; for women, 50% of districts displayed at least a 9-fold difference. Prevalence trends also varied over time: between 2000 and 2018, 70% of all districts saw a reduction in prevalence greater than five percentage points in at least one sex and age group. Meanwhile, over 30% of all districts saw at least a five percentage point prevalence increase in one or more sex and age groups.

Conclusions: As the HIV epidemic persists and evolves in SSA, geographic and demographic shifts in prevention and treatment efforts are necessary. These estimates offer epidemiologically informative detail to better guide more targeted interventions, vital for combating HIV in SSA.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12916-022-02639-z.


Background
Four decades after its discovery, human immunodeficiency virus (HIV) continues to impact millions of people worldwide, remains one of the leading causes of morbidity and mortality globally [1, 2], and incurs billions of dollars annually in direct health care costs and indirect socioeconomic costs [3]. In sub-Saharan Africa (SSA) in 2019, an estimated 26 million people were living with HIV [2]. In recent years, international bodies have set goals to end the HIV epidemic: in 2014, the Joint United Nations Programme on HIV/AIDS (UNAIDS) introduced the "95-95-95" targets, namely that by 2030, 95% of people living with HIV globally would know their status, 95% of all people with diagnosed HIV infection would receive sustained antiretroviral therapy (ART), and 95% of people living with HIV receiving ART would be virally suppressed [4,5]. The United Nations Sustainable Development Goals also call for an end to the AIDS epidemic by 2030 [6]. Unfortunately, despite a significant increase in ART coverage over the last 20 years and major progress in terms of reductions in HIV incidence and mortality [1], the latest estimates and projections indicate that the world is not on track to meet these goals [2,7,8], and progress may stall further as a consequence of the COVID-19 pandemic [9].
Differences in HIV prevalence both within and between nations in SSA have been well documented [10-14], as have differences between sexes [2,12-14] and age groups [2]. These differences have also changed over time [1,10], impacted in part by the onset, duration, location, and demographic targeting of different prevention and treatment interventions [15-17]. Epidemiologically targeted interventions are understood to be more effective than homogeneous interventions [18] and are increasingly important at a time when the future of funding for HIV prevention and treatment is both uncertain and highly variable [19,20], particularly in the wake of disruptions related to the COVID-19 pandemic [21]. Evidence suggests that interventions are most effective when tailored to account for differences in the intensity of the epidemic by geographic location [14,22], sex [23], and age [24]. Locally and demographically precise HIV prevalence information, however, is necessary to maximize the benefit of such methods; at present, such information in SSA is lacking. HIV prevalence estimates stratified by age and sex are available at the national level through the Global Burden of Disease (GBD) [2] and from UNAIDS [25]. Both sources also provide subnational estimates at the first administrative level (e.g., province, state) in select countries. Recently, Dwyer-Lindgren et al. [10] presented aggregated adult HIV prevalence estimates for the years 2000-2017 at local scales in SSA, generalizing estimates for males and females combined, and across ages 15-49 years. Some studies have gone further to present subnational prevalence estimates separated by sex [26-29] or age [30]; however, these studies focused on single countries and/or presented estimates for only one point in time, without describing any temporal trajectories in prevalence. To our knowledge, no previous studies have presented age- and sex-specific HIV prevalence estimates across SSA at local scales over time.
We built upon the HIV prevalence model from Dwyer-Lindgren et al. [10] to produce HIV prevalence estimates for 43 countries in SSA for males and females ages 15-59 years, stratified into nine 5-year age groups, for the years spanning 2000 to 2018. Countries, age groups, and time period were selected according to data availability. We expanded upon existing Bayesian spatiotemporal methods to model these estimates at a 5 × 5-km resolution and present them here aggregated to the second administrative level (which varies by country but typically corresponds to districts or municipalities), the level typically considered most relevant to policymakers and stakeholders. Prevalence estimates for all demographic groups at all levels of geographic aggregation, as well as the number of people living with HIV (count estimates), are publicly available from the Global Health Data Exchange (https://ghdx.healthdata.org/record/ihme-data/sub-saharan-africa-hiv-prevalence-geospatial-estimates-2000-2018) and through a user-friendly data visualization tool (http://vizhub.healthdata.org/lbd/hiv-prev-disagg).

Overview
This ecological study follows the Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER) [31] (Additional file 1: Section 1). This analysis relies on secondary data sources to provide estimates of HIV prevalence on a 5 × 5-km grid in 43 countries in SSA for males and females ages 15-59 years residing at each location, stratified into five-year age bins (i.e., ages 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59), with annual resolution from year 2000 to 2018 inclusive, calibrated to national estimates from the GBD [2]. The period of 2000-2018 and the age range of 15-59 years were selected to keep the estimates as current as possible and to account for data availability: there were relatively few large-scale seroprevalence surveys conducted before 2000, and most seroprevalence surveys focus on adults, with little reporting outside the 15-59 years age range. We produced estimates for sex rather than gender binaries because sex is more commonly reported in the available data sources. Due to data availability limitations, we were unable to produce prevalence estimates for sex minority individuals outside the male/female binary. The 43 countries analyzed were also selected according to data availability; Mauritania was excluded as there were no HIV prevalence data available. We included six countries (Djibouti, Guinea-Bissau, Madagascar, Somalia, South Sudan, and Sudan) where no seroprevalence survey data were available, but where sentinel surveillance data collected from antenatal care clinic (ANC) attendees (described below) were available. The implications of these and other limitations are expanded upon in the "Methodological advantages and limitations" section in the "Discussion" section.
The methodology used here largely parallels that previously used to map adult HIV prevalence in SSA [10], with the incorporation of modifications necessary to model by age and sex, and improvements related to the inclusion of spatially aggregated data and ANC data (Fig. 1). We used a 5 × 5-km grid for consistency with this previous analysis; to align with the resolution available for pre-existing covariates incorporated in this analysis; and for flexibility in aggregating these estimates to other levels of interest (e.g., first- and second-level administrative subdivisions, such as states or districts, respectively, or more aggregated age groups such as reproductive ages [commonly 15-49 years]) using grid-cell-level estimates of age- and sex-specific population from WorldPop [32]. These population estimates were also used to estimate the number of people living with HIV in each demographic group. All analyses were conducted in R version 3.6.1 [33]. Figure 2 provides an overview of the analytic process, described in more depth below. Additional details are available in Additional file 1.

HIV data
We compiled a geolocated dataset of 304,672 observations from 91 seroprevalence surveys from 37 countries and 10,351 observations from sentinel surveillance among antenatal care clinic attendees (ANC data) in 43 countries (Additional file 2: Tables S1-S2; Fig. 1). Data from seroprevalence surveys were originally in the form of survey microdata (that is, individual-level survey responses) or survey reports (Additional file 2: Table S1). For surveys with available microdata, we extracted variables related to age, sex, HIV blood test result, location, and year, as well as survey weights, where available. We excluded rows with missing information on any of these variables, and subset the data to ages 15-59 years. For data coded by gender rather than sex, we treated these data as if they were sex-specific rather than gender-specific. We recognize that sex and gender are not interchangeable: sex is a biological variable, while gender is a fluid social construct. In the absence of quality data, however, we could not disaggregate estimates by gender at this time. After subsetting by age, we collapsed the age-specific data into 5-year age bins (hereafter referred to as "ages") by sex. We did this by calculating the weighted age- and sex-specific HIV prevalence at the finest spatial resolution available. Ideally, this was at the level of global positioning system (GPS) coordinates that represent the location of a survey cluster. In most surveys, GPS coordinates are randomly displaced (typically by 2-5 km depending on the setting and the survey series [34]) in order to protect respondents' confidentiality. In instances where GPS coordinates were not available, the smallest areal unit (termed a "polygon") possible was used instead. These typically represented an administrative subdivision. For surveys without microdata but for which estimates with some subnational resolution were provided in a report, we extracted these estimates with information about the sample size and location. GPS coordinates were not available for these reports, so these data were exclusively matched to polygons. In most reports, age ranges larger than 5 years were reported. Among these, we retained data reported for age ranges that corresponded exactly to one or more of the 5-year age bins used in this model; for example, we included surveys covering age ranges 15-49 years, or 15-24 years, but excluded those covering age ranges such as 18-24 years. For age-aggregated data, we retained information regarding the age range covered, to be used in our modeling process as described below. We also only included sex-specific data. For more information on excluded surveys see Additional file 2: Table S3.

Fig. 1 HIV prevalence data by region and country. (a) HIV seroprevalence survey data and (b) ANC sentinel surveillance data used in this analysis, by region and country. Color indicates the data source. AIS, AIDS Indicator Survey; DHS, Demographic and Health Survey; MICS, Multiple Indicator Cluster Survey; PHIA, Population-based HIV Impact Assessment Survey. Shape type indicates whether a data source is age-specific and has point (GPS) or polygon location information. Size indicates the relative effective sample size for each source. A full list of data sources with additional details about data type (such as survey microdata and survey reports) and geographical details is provided in Additional file 2: Tables S1-S5.
Data that were spatially aggregated (i.e., polygon data) and/or age-aggregated required additional processing. Although we ultimately modeled HIV prevalence at the level of the observation, be it point or polygon, age-specific or age-aggregated, our modeling process initially specified HIV prevalence at the point-, time-, age-, and sex-specific level. Because of this, it was necessary that we disaggregate the age-aggregated and polygon survey data to be location- and age-specific. We did this by distributing polygon data to pixels proportional to population. Specifically, for each polygon, we generated points at the centroid of each 5 × 5-km pixel falling within that polygon and replicated that observation's HIV prevalence and sample size at the location of each of those centroids. Age-aggregated point data were similarly disaggregated by replicating the HIV prevalence and sample size once for each 5-year age group covered in the overall age range. In the cases of age-aggregated polygon data, these two processes were combined.

Fig. 2 Analytical process overview. The process used to produce age- and sex-specific HIV prevalence estimates in sub-Saharan Africa involved three main parts. In the data-processing steps (green), data were identified, extracted, and prepared for use in the HIV prevalence model and in covariate models. In the modeling phase (orange), we used these data and covariates in a stacked generalization ensemble model and spatiotemporal Gaussian process model. In the post-processing phase (blue), we calibrated the prevalence estimation to match GBD 2019 estimates at the national level, aggregated prevalence estimates to the first- and second-level administrative subdivisions in each country, and calculated the number of people living with HIV (PLHIV).
Next, each of the disaggregated, location- and age-specific rows of data associated with a given aggregated observation was assigned a weight proportional to the age- and sex-specific population residing at that location for the given year, derived from WorldPop [32]. The weights within each observation summed to one. This process substantially increased the size of the dataset. To reduce the associated computational burden when fitting the model, in cases where at least one row within an observation was given a weight of less than half of one divided by the number of locations and/or ages in that observation, we successively dropped the lowest-weighted locations and/or ages until reaching a maximum of 1% of the observation's weight dropped. Remaining locations and/or ages within that observation were then reweighted to maintain a total weight of one. Data that were not aggregated (i.e., age-specific point observations) were each assigned a weight of one.
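The weighting and pruning rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the analysis: the function names are hypothetical, and we assume that the 1% cap means rows are dropped from lowest weight upward only while the cumulative dropped weight stays at or below 0.01.

```python
def population_weights(populations):
    """Weights proportional to population, summing to one within an observation."""
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    return [p / total for p in populations]


def prune_low_weights(weights, max_dropped=0.01):
    """Drop the lowest-weighted rows, then renormalize the remainder.

    Pruning is triggered only if some row has a weight below half of 1/n.
    Assumption: the cumulative dropped weight may not exceed max_dropped.
    Returns a dict mapping kept row index -> renormalized weight.
    """
    n = len(weights)
    threshold = 0.5 / n  # half of one divided by the number of rows
    kept = dict(enumerate(weights))
    if min(weights) >= threshold:
        return kept  # no row is small enough to trigger pruning
    dropped = 0.0
    # Visit rows from lowest to highest weight.
    for idx in sorted(kept, key=kept.get):
        if dropped + weights[idx] > max_dropped:
            break  # dropping this row would exceed the cap
        dropped += weights[idx]
        del kept[idx]
    remaining = sum(kept.values())
    return {i: w / remaining for i, w in kept.items()}
```

For example, rows with populations 500, 300, 195, 4, and 1 would lose the last two rows (0.5% of the weight), and the remaining three weights would be rescaled to sum to one.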
ANC data were primarily derived from national HIV estimate files developed by national teams and compiled and shared via UNAIDS [35] and supplemented with data derived from sentinel surveillance country reports (Additional file 2: Table S2). We extracted information from these sources on HIV prevalence and sample size by site and year. Sites were geolocated to specific GPS coordinates where possible and otherwise to a polygon that represents an administrative subdivision. The ANC data available for this analysis were not age-specific. Because ANC data included only pregnant females, we assumed the age range of these data to be that of females with non-zero fertility rates in SSA according to GBD 2019 [36], that is, females ages 15-54 years. We disaggregated ANC data to the age and location level as we did for age-aggregated or polygon survey data. However, specific locations and ages were weighted by number of births rather than population size. The number of births for a given age and location was estimated as the product of the location-, age-, and sex-specific population, again derived from WorldPop [32], and the national fertility rate, derived from GBD 2019 estimates [36].
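As a rough illustration of the birth weighting, the following Python sketch computes weights proportional to population multiplied by the age-specific fertility rate. The function name, the dictionary layout, and the numbers in the usage note are hypothetical; only the product-then-normalize logic comes from the description above.

```python
def birth_weights(female_population, fertility_rate):
    """Weights proportional to expected births in each (location, age group) row.

    female_population: dict keyed by (location, age_group) -> female population
    fertility_rate:    dict keyed by age_group -> national fertility rate
    """
    births = {
        key: pop * fertility_rate[key[1]]  # key[1] is the age group
        for key, pop in female_population.items()
    }
    total = sum(births.values())
    if total <= 0:
        raise ValueError("expected births must be positive")
    return {key: b / total for key, b in births.items()}
```

With equal populations of 100 in two age groups and illustrative fertility rates of 0.1 and 0.3, the two rows would receive weights of 0.25 and 0.75.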

Covariates
This analysis included the same covariates as the previous analysis [10]. This included five pre-existing covariates: (1) travel time to the nearest settlement of more than 50,000 inhabitants; (2) total population; (3) night-time lights; (4) urbanicity; and (5) malaria incidence (Additional file 2: Table S4). In addition, eight covariates were constructed explicitly for this analysis owing to their known association with HIV prevalence and data availability: (1) prevalence of male circumcision (all forms); (2) prevalence of self-reported sexually transmitted infection (STI) symptoms; (3) prevalence of marriage or living with a partner as married; (4) prevalence of one's current partner living elsewhere among females; (5) prevalence of condom use at last sexual encounter; (6) prevalence of reporting ever having had intercourse among young females; and (7) and (8) prevalence of multiple partners in the past year for males and for females, respectively. We updated the covariates constructed for this analysis to incorporate newly available data but utilized the original statistical methods (Additional file 1: Section 3.2; Additional file 2: Table S5; Additional file 3: Figs. S1-S8).

Covariate stacking
An ensemble covariate modeling approach ("stacking") was implemented to capture possible nonlinear interactions among the covariates across space and time [37]. In this approach, three sub-models were fitted to the HIV survey data with the covariates as explanatory predictors: generalized additive models [38], boosted regression trees [39], and lasso regression [40]. Each sub-model was fitted using fivefold cross-validation to avoid overfitting, and the out-of-sample predictions from across the five folds were compiled into a single set of predictions that were used to fit the geostatistical model described below. In addition, each sub-model was also fitted to the full dataset to generate a complete set of in-sample predictions that were subsequently used when generating predictions from the geostatistical model (Additional file 3: Figs. S9-S11). Because the covariates used here were neither age-specific nor (for most) sex-specific, we fit these sub-models at the same age- and sex-aggregated level as the HIV-specific covariates, modeling HIV prevalence data aggregated across ages 15-49 years and across males and females. The age range 15-49 years was used in this case because of its more common usage in seroprevalence surveys compared to the 15-59 years range, allowing us to retain more data for the stacking model. Polygon data were excluded from stacking models due to their incongruity with the configurations needed for the different sub-models. The ANC data were also excluded due to known sampling biases, which are described in Additional file 1: Section 4.2.
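The out-of-sample step of stacking can be illustrated with a minimal Python sketch. The fold assignment, the function names, and the placeholder sub-model (which simply predicts the training mean) are all assumptions made for illustration; the analysis itself used generalized additive models, boosted regression trees, and lasso regression.

```python
import random


def out_of_fold_predictions(xs, ys, fit, n_folds=5, seed=0):
    """Return one out-of-sample prediction per row using k-fold cross-validation.

    fit(train_x, train_y) must return a callable model(x) -> prediction.
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    preds = [None] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train_x = [xs[i] for i in indices if i not in held]
        train_y = [ys[i] for i in indices if i not in held]
        model = fit(train_x, train_y)  # trained without the held-out rows
        for i in held_out:
            preds[i] = model(xs[i])
    return preds


def fit_mean(train_x, train_y):
    """Placeholder sub-model: predicts the training mean and ignores x."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean
```

Each row's prediction comes from a model that never saw that row, which is what allows the compiled predictions to serve as inputs to a second-stage model without leaking the outcome. The in-sample counterpart is simply `fit(xs, ys)` applied to every row.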

Geostatistical model
This model was fit in Template Model Builder (TMB) [41]. Owing to computational constraints, and to allow for regional differences in the relationships between covariates and HIV prevalence, as well as differences in the temporal, spatial, and demographic autocorrelation in HIV prevalence, separate models were fitted for four regions (Additional file 3: Fig. S12). We modeled HIV prevalence stratified by space, time, age, and sex using a generalized linear mixed-effects model. To simultaneously model point- and polygon-level observations, as well as both age-specific and age-aggregated observations, we specified the data likelihood at the observation level ($i$), which accommodated all of these. We modeled the number of HIV-positive individuals ($Y_i$) among a sample ($N_i$) for a given observation as a binomial variable:

$$Y_i \sim \mathrm{Binomial}(N_i, p_i)$$

Logit-transformed prevalence was, however, first specified at the space-, time-, age-, and sex-disaggregated level ($j$):

$$\mathrm{logit}(p_j) = \beta_0 + \beta_1 X_j + Z_{1,j} + Z_{2,j} + Z_{3,c[j]}$$

That is, we specified logit-transformed prevalence at the disaggregated level as a linear combination of:
• A regional intercept ($\beta_0$);
• Covariates and associated regression parameters ($\beta_1 X_j$);
• Random effects correlated across space and time ($Z_{1,j}$);
• Random effects correlated across time, age, and sex ($Z_{2,j}$);
• Country-specific ($c$) random effects correlated across age ($Z_{3,c[j]}$).
The random effects capturing correlations between space, time, age, and sex included:
• $Z_{1,j}$: a Gaussian process with mean 0 and a covariance matrix given by the Kronecker product of a spatial Matérn covariance function [42] ($\Sigma_{1,\mathrm{space}}$) and a temporal first-order autoregressive covariance function ($\Sigma_{1,\mathrm{time}}$);
• $Z_{2,j}$: a Gaussian Markov Random Field with mean 0 and a covariance matrix given by the Kronecker product of first-order autoregressive covariance functions for time ($\Sigma_{2,\mathrm{time}}$), age ($\Sigma_{2,\mathrm{age}}$), and sex ($\Sigma_{2,\mathrm{sex}}$);
• $Z_{3,c[j]}$: a Gaussian Markov Random Field with mean 0 and a covariance matrix given by country-specific first-order autoregressive covariance functions for age ($\Sigma_{3,c}$).
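To make the separable covariance structure concrete, the following Python sketch builds first-order autoregressive covariance matrices and combines them with a Kronecker product, as for the time-age-sex random effect. It is illustrative only: the parametrization (a marginal variance multiplied by a correlation of rho raised to the lag) and the correlation values in the usage note are assumptions, not fitted quantities.

```python
def ar1_covariance(n, rho, sigma2=1.0):
    """AR(1) covariance: sigma2 * rho**|a - b| (sigma2 is the marginal variance)."""
    return [[sigma2 * rho ** abs(a - b) for b in range(n)] for a in range(n)]


def kronecker(A, B):
    """Kronecker product of two matrices given as lists of rows.

    Entry [(i * len(B) + k)][(j * len(B[0]) + l)] equals A[i][j] * B[k][l].
    """
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]


def time_age_sex_covariance(rho_time, rho_age, rho_sex,
                            n_years=19, n_ages=9, n_sexes=2):
    """Separable covariance over 19 years, nine age groups, and two sexes."""
    time = ar1_covariance(n_years, rho_time)
    age = ar1_covariance(n_ages, rho_age)
    sex = ar1_covariance(n_sexes, rho_sex)
    return kronecker(kronecker(time, age), sex)
```

With illustrative correlations of 0.9, 0.8, and 0.5, the covariance between the first cell and the cell one year later, two age groups older, and of the other sex is 0.9 × 0.8² × 0.5 = 0.288, which shows how correlation decays multiplicatively along each dimension.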
We used the stochastic partial differential equation [43] approach to approximate the continuous spatiotemporal Gaussian random field ($Z_{1,j}$). Sensitivity analyses were carried out to compare this model configuration to others with differing specifications of $p_j$, as well as to several other model and data specifications, and are described in detail in Additional file 1: Section 4.3, Additional file 3: Figs. S13-S15, and the "Discussion" section. We then specified observation-level ($i$) prevalence: $p_i$ was calculated as the sum of disaggregated prevalence ($p_{\mathrm{transformed},j}$) estimates multiplied by their respective population (or in the case of ANC data, birth) weights ($w_j$), plus the incorporation of additional ANC-related transformations and bias corrections ($\beta_2$, $U_{s[i]}$, and $I_{\mathrm{ANC}}$, described below), and an observation-level uncorrelated error term ($\epsilon_i$). In cases where data were already disaggregated spatially and by age, $w_j = 1$.
HIV prevalence as measured by sentinel surveillance of ANC attendees is known to be biased as a measure of HIV prevalence in the general adult female population [44], because it covers only pregnant females who attend ANC rather than all adult females [45,46]. Additionally, fertility rates differ between HIV-positive and HIV-negative females, with the exact relationship varying by age [47], thereby impacting age-specific ANC clinic visitation rates. To address this, for ANC data we transformed prevalence among pregnant females based on the underlying prevalence among all females and the age-specific fertility-rate ratio (HIV-positive fertility/HIV-negative fertility). For ANC data,

$$p_{\mathrm{transformed},j} = \frac{p_j \, FRR_j}{p_j \, FRR_j + (1 - p_j)}$$

Fertility rate ratios ($FRR_j$) were derived from GBD 2019 fertility estimates [36], taken at the national level except in cases where subnational estimates were available (in Ethiopia, Nigeria, and South Africa). For survey data,

$$p_{\mathrm{transformed},j} = p_j$$

To allow for additional ANC-related bias at the observation level ($i$), in instances where data in our model were derived from ANC sentinel surveillance (where $I_{\mathrm{ANC}} = 1$ for ANC data, and $I_{\mathrm{ANC}} = 0$ for all other data), our model incorporated a fixed term ($\beta_2$) that captured overall mean bias in the ANC data, and a random effect ($U_{s[i]}$) for a given ANC site $s$ that captured spatial differences in the extent of this bias. Fitted model parameters are detailed in Additional file 2: Table S6. From each fitted model, we generated 1000 draws from the approximated joint posterior distribution of all model parameters and used these to construct 1000 draws of $p_j$, setting $I_{\mathrm{ANC}}$ to 0. Fivefold cross-validation was used to assess model performance and to compare a number of alternative models (Additional file 3: Figs. S13-S15). We also compared the re-aggregated adult-level estimates from our final model to those from an age- and sex-aggregated counterpart (Additional file 3: Fig. S16).
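The ANC adjustment can be sketched as follows. The first function assumes the transformation takes the form that follows directly from the definition of a fertility-rate ratio: if HIV-positive females give birth at FRR times the rate of HIV-negative females, prevalence among pregnant females is p × FRR / (p × FRR + 1 − p). The second function additionally assumes that the ANC bias terms and the error term enter additively on the logit scale; the text does not state the scale, so this is an illustrative choice. All function and argument names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def prevalence_among_pregnant(p_all, frr):
    """Prevalence among pregnant females implied by prevalence among all females."""
    return p_all * frr / (p_all * frr + (1.0 - p_all))


def observed_prevalence(p_j, w_j, frr_j=None, beta2=0.0, u_site=0.0, eps=0.0):
    """Observation-level prevalence from disaggregated prevalences and weights.

    Passing frr_j marks the observation as ANC data (I_ANC = 1).
    Assumption: bias (beta2 + u_site) and error (eps) are added on the logit scale.
    """
    is_anc = frr_j is not None
    if is_anc:
        p_t = [prevalence_among_pregnant(p, f) for p, f in zip(p_j, frr_j)]
    else:
        p_t = list(p_j)  # survey data are left untransformed
    pooled = sum(w * p for w, p in zip(w_j, p_t))
    shift = (beta2 + u_site) if is_anc else 0.0
    return inv_logit(logit(pooled) + shift + eps)
```

For instance, a prevalence of 20% among all females with a fertility-rate ratio of 0.5 corresponds to a prevalence of about 11% among pregnant females, which illustrates why unadjusted ANC data can understate prevalence in age groups where HIV reduces fertility.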

Post-estimation
To take advantage of the more structured modeling approach and additional national-level data used by GBD 2019 [2], we performed post hoc calibration of our estimates to the corresponding national-level GBD estimates. For each country, year, age bin, and sex in our analysis, we defined a "raking factor" equal to the ratio of the GBD estimate for this country-year-age-sex to the population-weighted posterior mean HIV prevalence in all corresponding grid cells (Additional file 3: Figs. S17-S18). These raking factors were then used to scale each draw of HIV prevalence for each grid cell within that GBD geography, year, age, and sex. Point estimates for each grid cell were calculated as the mean of the scaled draws, and 95% uncertainty intervals were calculated as the 2.5th and 97.5th percentiles of the scaled draws. Grid cells that crossed international borders within modeling regions were fractionally allocated to multiple countries in proportion to the covered area during this process. In cases where subnational (i.e., first administrative level) estimates were available from the GBD, that is, for Ethiopia, Nigeria and South Africa, we calibrated to those estimates rather than those at the national level. Uncertainty in GBD estimates was not accounted for in this calibration.
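The calibration step amounts to a single multiplicative factor per country, year, age bin, and sex. A minimal Python sketch, with hypothetical function names and made-up numbers in the usage note, might look like this; it does not handle the fractional allocation of border cells, and a multiplicative factor does not by itself keep scaled values below one.

```python
def mean(values):
    return sum(values) / len(values)


def raking_factor(gbd_estimate, cell_draws, cell_pops):
    """Ratio of the GBD estimate to the population-weighted posterior mean.

    cell_draws: list of per-cell lists of posterior draws of prevalence
    cell_pops:  list of per-cell populations for the same age and sex
    """
    cell_means = [mean(draws) for draws in cell_draws]
    weighted = sum(m * n for m, n in zip(cell_means, cell_pops)) / sum(cell_pops)
    return gbd_estimate / weighted


def rake(cell_draws, factor):
    """Scale every draw in every grid cell by the raking factor."""
    return [[d * factor for d in draws] for draws in cell_draws]
```

If two cells with populations 100 and 300 have posterior means of 0.10 and 0.20, the population-weighted mean is 0.175; a GBD estimate of 0.21 then gives a raking factor of 1.2, and the scaled draws reproduce 0.21 when re-aggregated.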
In addition to estimates of HIV prevalence on a 5 × 5-km grid, we constructed estimates of HIV prevalence for first- and second-level administrative subdivisions. We did this by calculating age- and sex-specific population-weighted averages of prevalence for all grid cells within a given area. This process was carried out for each of the 1000 posterior draws (after calibration to GBD), with final point estimates derived from the mean of these draws and uncertainty intervals from the 2.5th and 97.5th percentiles. Additionally, estimates of the number of people living with HIV for a given age and sex in each grid cell were derived by multiplying estimated prevalence in each grid cell by the corresponding population estimate from WorldPop [32], which was also calibrated to match GBD 2019 [36] (Additional file 1: Section 4.4). Complete estimates of people living with HIV are available along with all prevalence estimates at https://ghdx.healthdata.org/record/ihme-data/sub-saharan-africa-hiv-prevalence-geospatial-estimates-2000-2018.
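The draw-level aggregation and summary can be sketched in Python as below. Function names are hypothetical, and the percentile uses linear interpolation between order statistics, which is an assumption about the exact quantile definition.

```python
import math


def aggregate_draws(cell_draws, cell_pops):
    """Population-weighted mean prevalence across grid cells, one value per draw."""
    total = sum(cell_pops)
    n_draws = len(cell_draws[0])
    return [
        sum(draws[d] * n for draws, n in zip(cell_draws, cell_pops)) / total
        for d in range(n_draws)
    ]


def percentile(sorted_values, q):
    """Percentile (q in [0, 100]) by linear interpolation; input must be sorted."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize(draws):
    """Point estimate (mean) and 95% uncertainty interval from posterior draws."""
    s = sorted(draws)
    return {
        "mean": sum(s) / len(s),
        "lower": percentile(s, 2.5),
        "upper": percentile(s, 97.5),
    }


def people_living_with_hiv(prevalence, population):
    """Count estimate for one grid cell, age, and sex."""
    return prevalence * population
```

Aggregating within each draw before summarizing, rather than averaging cell-level intervals, is what lets the uncertainty interval for a district reflect the joint posterior across its grid cells.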
Although the model makes predictions for all locations covered by available covariates, all final model outputs for which land cover was classified as barren or sparsely vegetated according to European Space Agency Climate Change Initiative satellite data [48] and for which total population density was less than 10 individuals per 1 × 1-km in 2015 were masked for improved clarity when communicating with data specialists and policymakers. Maps were generated in R using the ggplot2 [49] package version 3.3.0.

Geographic variation
We found large differences in the spatial and demographic distribution of estimated HIV prevalence in SSA that were masked in demographically aggregated estimates (Figs. 3 and 4; Additional file 3: Figs. S19-S34). This was particularly striking among middle and older age groups. For example, in the year 2018, the maximum estimated HIV prevalence in any second-level administrative unit for adults ages 15-59 years was 35 [...]

Geographic variation within countries was also more dramatic in our demographically disaggregated results. Across SSA countries, the median absolute difference between second-level administrative units with the lowest and highest estimated prevalence within a given country in 2018 was 3.5 times greater when considered across ages and sexes than when estimated for all adults combined (11.2 percentage points versus 3.2 percentage points). This difference in within-country prevalence range between demographically aggregated and disaggregated estimates varied greatly between countries. For example, in Mozambique, this range across second-level administrative units was 30 [...]

Variation between males and females
Across SSA and across the years 2000-2018, estimated HIV prevalence was generally higher among females than males (Fig. 5). [...] [2.1-21.4%] prevalence in females compared to 3.1% [0.8-8.1%] prevalence in males). Across Central SSA second-level administrative units, the median ratio between female and male estimated prevalence was 2.2, compared to the all-SSA median ratio of 1.6. The greatest absolute differences were seen in Eastern SSA, where the median absolute difference between female and male estimated prevalence was 1.9 percentage points in 2018, compared to the all-SSA median absolute difference of 0.9 percentage points. These differences between female and male prevalence in 2018 were less than those observed in the year 2000, when the median ratio between female and male estimated prevalence was 1.5, and the median absolute difference was 1.5 percentage points. We did not note substantial differences in within-country variations in prevalence between females and males in either 2000 or 2018 in any region. For complete comparisons between sexes by second-level administrative unit, including uncertainty estimates, see Additional file 4.

Maps reflect national boundaries, land cover, lakes, and population; areas with fewer than ten people per 1 × 1 km, and classified as barren or sparsely vegetated, are colored light gray. Countries colored in dark gray were not included in the analysis.

Variation between age groups
Prevalence within second-level administrative units was also highly variable across age groups (Fig. 6), and relative variation in prevalence between age groups in 2018 tended to be higher in males. Comparing estimated prevalence across age groups within a given second-level administrative unit in 2018, the ratio between highest and lowest prevalence among age groups tended to be larger among males compared to females (median ratio across all SSA second-level administrative units of 14.4 for males, and 9.3 for females). For males, this ratio between highest and lowest estimated prevalence among age groups was smaller in Central SSA compared to other regions (median ratio of 8.3) and was largest in Western SSA (median ratio of 21.7). There was little regional difference for females. The sexes also differed in changes in this ratio between years, where it decreased over time for males (with a median ratio in 2000 of 52.7) but increased over time for females (median ratio in 2000 of 5.6). For complete age variation comparisons by second-level administrative unit, including uncertainty estimates, see Additional file 4.

Relative uncertainty is defined as the ratio of the width of the 95% uncertainty interval to the mean estimate.
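The between-age-group ratio reported above is simple to compute from district-level estimates. The following Python sketch uses hypothetical function names and made-up prevalence values; it assumes every age group has a strictly positive estimated prevalence, so that the ratio is defined.

```python
from statistics import median


def age_group_ratio(prevalence_by_age):
    """Ratio of the highest to the lowest prevalence across age groups."""
    values = list(prevalence_by_age.values())
    return max(values) / min(values)


def median_age_group_ratio(districts):
    """Median of the highest-to-lowest ratio across districts.

    districts: dict keyed by district name -> dict of age group -> prevalence
    """
    return median(age_group_ratio(by_age) for by_age in districts.values())
```

A district with 1% prevalence in its lowest age group and 14% in its highest has a ratio of 14, comparable in size to the median reported for males in 2018.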
Across SSA, the age group with the highest estimated prevalence in any given second-level administrative unit in 2018 was always between ages 35 and 54 years for males and between 30 and 49 years for females (Fig. 6) Within-country variation between second-level administrative units was relatively consistent across age groups. The ratio of maximum to minimum estimated prevalence among districts within each country was lowest for ages 35-39 years (median ratio of 4.3 across countries) and highest for ages 15-19 years (median ratio of 4.8 across countries) in 2018. Slightly larger differences were seen between age groups in Eastern and Southern SSA, with lower variation in middle-age groups and greater withincountry variation in younger age groups. The maximumto-minimum within-country prevalence ratio in Eastern SSA was lowest for adults ages 40-44 years (median ratio of 5.4 across Eastern SSA countries) and highest for adults ages 15-19 years (median ratio of 6.7 across Eastern SSA countries). These same age groups also represented the highest and lowest ratios in Southern SSA countries, with median values of 2.0 in adults ages 40-44 years and 2.8 in adults ages 15-19 years.

Variation over time
Estimated change in prevalence over time among all adults masked broad differences between specific age and sex groups (Fig. 7; Additional file 3: Figs. S35-S40). Large temporal changes were much more common [...] The distribution of districts with large increases or decreases in prevalence over time also varied greatly by region. All regions saw a decrease of greater than 5 [...] We found diverging overall trends between age groups over time, with greater decreases over time among younger age groups, and greater increases among older age groups. For example, for females ages 25-29 years, we found that estimated prevalence decreased by at least [...]

Fig. 6 Differences in prevalence between age groups in the year 2018 at the second administrative level, calculated as the ratio of estimated prevalence between the age groups with highest and lowest prevalence, for (a) males and (b) females; and the age groups with highest prevalence for (c) males and (d) females in 2018. Maps reflect national boundaries, land cover, lakes, and population; areas with fewer than ten people per 1 × 1 km, and classified as barren or sparsely vegetated, are colored light gray. Countries colored in dark gray were not included in the analysis.

Discussion
The results of this study, the first to present age- and sex-specific HIV prevalence estimates across sub-Saharan Africa at local scales, emphasize the interactions of geographic and demographic differences in HIV prevalence, going beyond previous research focused on either aspect individually. Just as previous work demonstrated how much geographic variability is masked in national prevalence estimates [10], we show here that demographically aggregated estimates mask important variation in the age and sex distributions of HIV prevalence at a local level, which in turn provide much clearer insights into the evolution of the HIV epidemic in SSA.
Many intervention methods are commonly used in the fight against the HIV epidemic, and variation in their efficacy and implementation has likely contributed to the prevalence trends presented here. Cost-efficiency is a consistent priority and is generally maximized by using targeted, integrated interventions [50]. For example, HIV prevention via behavioral and biomedical interventions based on local prevalence rates, HIV testing, and treatment initiation may be priorities for some age groups [51], while long-term ART retention and comorbidity care may require more emphasis for others [52]. Barriers to access to care often differ between geographic and demographic groups, where in some cases barriers may be logistical (e.g., geographic isolation and programmatic fragmentation [53]) or social (e.g., lack of information, stigmatization, homophobia [54]), and require different intervention methods. Males and females are also often targeted using different points of contact. For example, HIV testing has been recommended for all females attending antenatal care clinics [55], whereas for males the provision of self-, home-based, and mobile testing compared to facility-based testing may be more useful for testing and subsequent uptake of care [56-58]. Effective targeting of these interventions requires local, demographically specific HIV burden information, such as that provided in the estimates presented here. Countries may similarly use this burden information to prioritize subnational and demographically specific treatment needs. This resource may also be useful in program evaluation efforts and thus aid the development of more successfully tailored interventions.
Variation in the social determinants driving HIV incidence and mortality, and thus HIV prevalence, is also an important consideration when assessing inequalities in HIV prevalence between locations and demographic groups. While prevalence among females is consistently higher than prevalence among males, for example, these differences can be attributed to different exposure to risk factors (such as age at first sex between males and females, marital status) in different countries [59]. In addition to understanding local patterns in HIV prevalence, effective interventions also need to consider, if not focus directly on, locally important risk factors and determinants of HIV infection and mortality [60,61].
Our estimates point to many local shifts in HIV prevalence over time. A multitude of factors can affect HIV prevalence trends at the local level over time, from local changes in prevention interventions to shifts in the overall demographics of an area, but one particularly important factor is local scale-up of ART [62,63]. Increases in ART coverage and reduced treatment costs have repeatedly been associated with large demographic shifts among people living with HIV [64] due to its success in reducing HIV mortality, leading to greatly increasing numbers of people living with HIV over the age of 50 years; our results reflect this trend. Given evidence pointing to differences between younger and older ART patients in rates of CD4 cell count decline [65], immune reconstitution rates [66], and risk of associated non-communicable diseases [67,68], among other health metrics [69], it is necessary that treatment plans for older patients be specifically tailored for their age group. Our results highlight those locations with large existing populations of people living with HIV for ages 50-59 years, and those seeing rapid growth of HIV prevalence in that demographic group. At the same time, the minimal change in estimated prevalence over time among the youngest age groups suggests that continued and even expanded efforts in HIV prevention for adolescents and young adults still need to be maintained as a priority across the continent.
Despite the significant progress made through this analysis in describing HIV burden in SSA, prevalence estimates mask complex and varied relationships between HIV incidence and mortality, as well as migration and seasonal mobility. It is difficult to determine, for example, if a dramatic decrease in HIV prevalence in an area is due to reduced incidence, increased mortality, or differences in the immigration and emigration rates of HIV-positive and HIV-negative individuals. Primary data for all three of these metrics are not widely available for SSA, adding additional complexity to the interpretation of our estimates. Importantly, no estimates of these indicators are consistently available at local scales for specific demographic groups. Furthermore, local data related to diagnosis, treatment, and viral suppression rates are also limited, despite these metrics lying at the heart of the UNAIDS 95-95-95 goals [4]. While very informative, difficulties can still arise in intervention decision-making built around HIV prevalence estimates alone, without understanding their underlying drivers. Improved surveillance of HIV prevalence, incidence, and mortality, combined with reliable population and migration estimates and information on local programs, is necessary to fully understand the complexities of the region's HIV epidemic. Clearly, even with the development of more comprehensive burden information, any modeled estimates should only be used for intervention purposes in conjunction with local program knowledge.

Methodological advantages and limitations
The methods used in this analysis build upon those previously used by Dwyer-Lindgren et al. to model adult HIV prevalence [10]. While this analysis does improve upon and have advantages over the previous methods in some ways, it faces some of the same limitations, as well as some new ones. As with the previous study, and as with all modeling studies, the quality of our estimates is highly dependent on the quality and coverage of our input data. Despite constructing a large database of HIV prevalence data, coverage gaps and small sample sizes in some locations can be associated with imprecision and/or large uncertainty intervals in some of our prevalence estimates (Additional file 3: Figs. S27-S34). Additionally, the location information associated with the data compiled for this analysis is subject to some error. In order to protect respondent confidentiality, most surveys that collect GPS coordinates perform some type of random displacement on those coordinates prior to releasing data for secondary analysis: for example, GPS coordinates for Demographic and Health Surveys (DHS) are displaced by up to 2 km for urban clusters, up to 5 km for most rural clusters, and up to 10 km in a random 1% of rural clusters [34]. Past research has found that displacement can degrade the predictive power of a geostatistical model; however, this effect was found to be modest, and researchers concluded that relatively accurate mapping can be undertaken at a 5 × 5-km resolution even with GPS displacement [70].
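To make the displacement limits described above concrete, the following minimal sketch simulates a displacement of cluster coordinates. Only the distance limits (2 km urban, 5 km for most rural clusters, 10 km for a random 1% of rural clusters) come from the text. The uniform draws for distance and bearing, the flat-earth offset approximation, the function names, and the example coordinates are all assumptions made for illustration, and this is not the DHS procedure itself.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def max_displacement_km(is_urban, rng):
    """Distance limit: 2 km urban, 5 km rural, 10 km for ~1% of rural clusters."""
    if is_urban:
        return 2.0
    return 10.0 if rng.random() < 0.01 else 5.0


def displace(lat, lon, is_urban, rng):
    """Shift a point by a random distance and bearing (assumed uniform draws)."""
    distance = rng.uniform(0.0, max_displacement_km(is_urban, rng))
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # Small-offset (flat-earth) approximation, adequate at these distances.
    dlat = distance * math.cos(bearing) / EARTH_RADIUS_KM
    dlon = distance * math.sin(bearing) / (
        EARTH_RADIUS_KM * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
true_lat, true_lon = -1.0, 34.5  # hypothetical cluster location
new_lat, new_lon = displace(true_lat, true_lon, is_urban=False, rng=rng)
shift = haversine_km(true_lat, true_lon, new_lat, new_lon)
print(f"displaced by {shift:.2f} km")
```

With a 5 × 5-km grid, a rural displacement of up to 5 km can move a cluster into a neighboring pixel, which is one way to see why displacement adds noise to covariate values attached to each cluster.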
The approximate integration method we use in this analysis better handles uncertainty estimation and easily accommodates not only polygon data but age-aggregated data as well, compared to the polygon resampling method that has been used elsewhere [10,71,72]. At the same time, given the large number of dimensions being modeled, as well as the high data input count produced by our data disaggregation technique, we found that current matrix packages, as well as our computational facilities, could not accommodate a Gaussian process that accounted for the covariance of a complete space-time-age-sex Kronecker product. We therefore focused on the interactions between space, time, age, and sex that we believed would be most relevant in terms of capturing important variability in these dimensions, within our computational abilities. Our modeling strategy also assumed no difference in the probability that an HIV-positive versus an HIV-negative pregnant woman would access antenatal care and therefore be included in ANC surveillance.
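A rough sense of why a complete space-time-age-sex Kronecker structure becomes large can be had from the dimensions alone. In the sketch below, the 19 years (2000-2018), 9 five-year age groups, and 2 sexes follow the analysis, but the number of spatial locations is a hypothetical placeholder, and dense double-precision storage is assumed purely as a worst-case illustration; the source does not state the mesh size or the matrix representation actually attempted.

```python
from functools import reduce


def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


# Tiny example: a 2x2 "age" block structure combined with a 2x2 "sex" structure.
age_cov = [[1, 2], [3, 4]]
sex_cov = [[0, 1], [1, 0]]
small = kron(age_cov, sex_cov)  # 4x4 matrix

# Dimension counts. Years, age groups, and sexes follow the analysis;
# n_space is a HYPOTHETICAL number of spatial locations.
n_space, n_time, n_age, n_sex = 10_000, 19, 9, 2
full_dim = reduce(lambda x, y: x * y, [n_space, n_time, n_age, n_sex])

# Worst-case dense storage at 8 bytes per entry (an assumption, see lead-in).
dense_bytes = 8 * full_dim ** 2

print(f"rows/columns in the full product: {full_dim:,}")
print(f"dense storage: {dense_bytes / 1e12:.1f} TB")
```

The side length of the product is the product of the component sizes, so the number of entries grows with the square of that product; this is the motivation for modeling only a subset of the interactions.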
Due to limited data availability, we delineated estimates in this analysis using a male/female binary. We recognize that this approach does not allow for investigation of HIV prevalence among gender and sex diverse people, despite the disproportionate burden of HIV commonly seen among these populations [73]. Further, we recognize that many data sources do not provide the option to select a sex other than "male" or "female," or gender options beyond "man" or "woman," and often conflate gender with sex. In the future, we hope that high-quality data on HIV prevalence for gender and sex diverse people will be more widely available, so we can produce estimates beyond females and males.
We note that our results include unprecedentedly high prevalence estimates for certain population subsets. In most cases, we do not believe these estimates are implausible. For example, we estimated prevalence among middle- and older-aged females to be up to 59.2% [45.9-73.0%] in Umgungundlovu in KwaZulu-Natal, South Africa in 2018. Previous research has estimated prevalence for female adults of all ages combined in Umgungundlovu in 2017 to be 46.6% [43.8-49.5%] [74]. As we have shown that prevalence in middle- and older-aged females tended to be higher than all-ages prevalence, we believe our estimates for middle- and older-aged females during this time period in this location to be reasonable, especially with uncertainty intervals taken into consideration. In rare cases, however, our methods yielded estimates which we were unable to support through the literature. For example, for males ages 35-39 and 40-44 years in Nyatike in Migori, Kenya, we estimated prevalence in the year 2000 to be 77.8% [50.2-100.0%] and 78.7% [50.0-100.0%], respectively. It is unlikely true prevalence in that area and year was this high (though given the large uncertainty intervals associated with these values, it is probable that true prevalence does fall within those ranges). We note, however, that the high estimates in this area and surrounding second-level administrative units were predominantly associated with the earlier years in our time series; we believe the more recent estimates in Nyatike to be more realistic [75]. In these locations, decreases in prevalence over time may therefore also be overestimated. These instances were rare.
A combination of data limitations and model complexity ultimately led to large uncertainty intervals around our estimates. Our 95% coverage estimates in model validation were consistently higher than expected (Additional file 3: Figs. S14-S16), which indicates that these uncertainty intervals may be larger than appropriate. Wide uncertainty can limit the utility of our estimates in terms of informing HIV policies, and reducing this uncertainty through improved data coverage will be an important consideration in future iterations of this model. We were also unable to account for all sources of uncertainty, such as uncertainty in the WorldPop estimates used in many stages of our modeling and estimation processes and uncertainty in covariates.
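The coverage check mentioned above can be sketched as follows. Coverage here is taken to mean the share of held-out observations that fall inside their 95% uncertainty intervals, which is a common definition but an assumption about the validation used in this analysis. All observations and intervals below are hypothetical toy values, not study data.

```python
NOMINAL_COVERAGE = 0.95


def coverage(observed, lower, upper):
    """Share of observations that fall inside their uncertainty intervals."""
    inside = sum(lo <= obs <= hi for obs, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)


# Hypothetical held-out prevalence observations with deliberately wide
# 95% intervals around them; these are NOT values from the study.
observed = [0.031, 0.052, 0.118, 0.009, 0.204, 0.075, 0.143, 0.060]
lower = [0.005, 0.010, 0.040, 0.001, 0.090, 0.020, 0.050, 0.015]
upper = [0.110, 0.150, 0.260, 0.060, 0.380, 0.190, 0.300, 0.170]

empirical = coverage(observed, lower, upper)
print(f"empirical coverage: {empirical:.3f} (nominal {NOMINAL_COVERAGE})")
if empirical > NOMINAL_COVERAGE:
    print("intervals contain more observations than expected; they may be too wide")
```

Empirical coverage well above the nominal 95% suggests conservative (overly wide) intervals, while coverage well below it suggests intervals that are too narrow; with only a handful of points, as in this toy example, the comparison is illustrative rather than conclusive.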

Conclusions
HIV continues to impose enormous human and financial costs [3] on SSA, decades since its emergence. Financial and logistical disruptions and discontinuities due to the impacts of COVID-19, as well as changes in ART adherence, are likely to present new barriers [21,76] to the UNAIDS 95-95-95 goals [4]. This analysis provides important insight into the nuances of HIV burden in SSA, offering information that is critical to the development of targeted interventions.
Additional file 1: Supplemental information. 1. Compliance with the Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER). 2. HIV data sources and data processing. 3. Covariate and auxiliary data. 4. Statistical model. 5. References.
Additional file 2: Supplemental tables. Table S1. HIV seroprevalence survey data. Table S2. ANC sentinel surveillance data. Table S3. HIV and covariates surveys excluded from this analysis. Table S4. Sources for preexisting covariates. Table S5. HIV covariate survey data. Table S6. Fitted model parameters.
Additional file 3: Figure S1. Prevalence of male circumcision. Figure S2. Prevalence of signs and symptoms of sexually transmitted infections. Figure S3. Prevalence of marriage or living as married. Figure S4. Prevalence of partner living elsewhere among females. Figure S5. Prevalence of condom use during most recent sexual encounter. Figure S6. Prevalence of sexual activity among young females. Figure S7. Prevalence of multiple partners among males in the past year. Figure S8. Prevalence of multiple partners among females in the past year. Figure S9. HIV prevalence predictions from the boosted regression tree model. Figure S10. HIV prevalence predictions from the generalized additive model. Figure S11. HIV prevalence predictions from the lasso regression model. Figure S12. Modeling regions. Figure S13. Age- and sex-specific vs. adult prevalence modeling. Figure S14. Data sensitivity. Figure S15. Model specification validation. Figure S16. Modeled and re-aggregated adult prevalence comparison. Figure S17. HIV prevalence raking factors for males. Figure S18. HIV prevalence raking factors for females. Figure S19. Age-specific HIV prevalence in males, 2000. Figure S20. Age-specific HIV prevalence in females, 2000. Figure S21. Age-specific HIV prevalence in males, 2005. Figure S22. Age-specific HIV prevalence in females, 2005. Figure S23. Age-specific HIV prevalence in males, 2010. Figure S24. Age-specific HIV prevalence in females, 2010. Figure S25. Age-specific HIV prevalence in males, 2018. Figure S26. Age-specific HIV prevalence in females, 2018. Figure S27. Age-specific uncertainty interval range estimates in males, 2000. Figure S28. Age-specific uncertainty interval range estimates in females, 2000. Figure S29. Age-specific uncertainty interval range estimates in males, 2005. Figure S30. Age-specific uncertainty interval range estimates in females, 2005. Figure S31. Age-specific uncertainty interval range estimates in males, 2010. Figure S32. Age-specific uncertainty interval range estimates in females, 2010. Figure S33. Age-specific uncertainty interval range estimates in males, 2018. Figure S34. Age-specific uncertainty interval range estimates in females, 2018.

Funding
This work was primarily supported by grant OPP1132415 from the Bill & Melinda Gates Foundation. The funder of the study had no role in study design, data collection, data analysis, data interpretation, writing of the report, or decision to publish. The corresponding authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Availability of data and materials
The findings of this study are supported by data available in public online repositories and data publicly available upon request of the data provider. Details regarding the data sources used and their availability can be found in Additional file 2: Supplemental Tables 1-5 and online via the Global Health Data Exchange (https://ghdx.healthdata.org/record/ihme-data/sub-saharan-africa-hiv-prevalence-geospatial-estimates-2000-2018). Estimates can also be further explored through the Global Health Data Exchange, as well as via our online visualization tool (http://vizhub.healthdata.org/lbd/hiv-prevdisagg). Administrative boundaries were modified from the Database for Global Administrative Areas (GADM) dataset [77]. Populations were retrieved from WorldPop [32]. This study complies with the Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER) recommendations [31]. All maps and figures presented in this study are generated by the authors; no permissions are required for publication. All computer code is available online and can be found at https://github.com/ihmeuw/lbd/tree/hiv_prev-africa-2020.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
[...]; grants or contracts from National Institutes of Health and Firland Foundation as payments to their institution; consulting fees from United States Agency for International Development as personal payments, and from KNCV Tuberculosis Foundation as payments to their institution; all outside the submitted work. E Rubagotti reports payment or honoraria for lectures, presentations, speakers bureaus, manuscript writing, or educational events from the Greenwich China Office and University Prince Mohammad VI, Morocco, all outside the submitted work. B Sartorius reports grants or contracts from DHSC - GRAM Project; leadership or fiduciary role in other board, society, committee or advocacy group, paid or unpaid, as a member of the GBD Scientific Council and a Member of WHO RGHS; all outside the submitted work. J A Singh reports consulting fees from Crealta/Horizon, Medisys, Fidia, PK Med, Two labs Inc, Adept Field Solutions, Clinical Care options, Clearview healthcare partners, Putnam associates, Focus forward, Navigant consulting, Spherix, MedIQ, Jupiter Life Science LLC, UBM LLC, Trio Health, Medscape, WebMD, and Practice Point communications, and the National Institutes of Health and the American College of Rheumatology; payment or honoraria for participating in the speakers bureau for Simply Speaking; support for attending meetings and/or travel from the steering committee of OMERACT, to attend their meeting every 2 years; participation on a Data Safety Monitoring Board or Advisory Board as an unpaid member of the FDA Arthritis Advisory Committee; leadership or fiduciary role in other board, society, committee or advocacy group, paid or unpaid, as a member of the steering committee of OMERACT, an international organization that develops measures for clinical trials and receives arm's length funding from 12 pharmaceutical companies, with the Veterans Affairs Rheumatology Field Advisory Committee as Chair, and with the UAB Cochrane Musculoskeletal Group Satellite Center on Network Meta-analysis as a director and editor; stock or stock options in TPT Global Tech, Vaxart pharmaceuticals, Atyu Biopharma, Adaptimmune Therapeutics, GeoVax Labs, Pieris Pharmaceuticals, Enzolytics Inc, Series Therapeutics, Tonix Pharmaceuticals, and Charlotte's Web Holdings Inc. and previously owned stock options in Amarin, Viking, and Moderna pharmaceuticals; all outside the submitted work. N Taveira reports grants or contracts from FCT and Aga Khan Development Network (AKDN) - Portugal Collaborative Research Network in Portuguese speaking countries in Africa (Project reference: 332821690) and from European & Developing Countries Clinical Trials Partnership (EDCTP), UE (Project reference: RIA2016MC-1615), as payments made to their institution, all outside the submitted work.